Harnessing the lawless: using comparable corpora to find translation equivalents

Author

  • Serge Sharoff
Abstract

Bilingual dictionaries provide basic translation equivalents for a headword and typically limit the set of equivalents to words of the same part of speech as the headword. However, words taken in their contexts can be translated in many more ways. At the same time, equivalents listed in dictionaries are not adequate in many contexts, because of the contextual and collocational sensitivity of target language expressions. The problem is particularly acute for novice translators, who lack the experience needed to find contextually appropriate translations. The paper proposes a methodology for finding translation equivalents in comparable corpora. This helps in training translation students to be aware of the translation potential of polysemous words from the general lexicon.

Similar resources

Using collocations from comparable corpora to find translation equivalents

In this paper we present a tool for finding appropriate translation equivalents for words from the general lexicon using comparable corpora. For a phrase in the source language the tool suggests a range of possible expressions used in similar contexts in target language corpora. In the paper we discuss the method and present results of human evaluation of the performance of the tool.

Adapted Seed Lexicon and Combined Bidirectional Similarity Measures for Translation Equivalent Extraction from Comparable Corpora

An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon—which is used to bridge contexts in different languages—is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by ...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually drawn from three kinds of resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are used as well, for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

Generalising Lexical Translation Strategies for MT Using Comparable Corpora

We report on an ongoing research project aimed at increasing the range of translation equivalents which can be automatically discovered by MT systems. The methodology is based on semi-supervised learning of indirect translation strategies from large comparable corpora and their application at run time to generate novel, previously unseen translation equivalents. This approach is different from...

A Corpus-Based Study of zunshou and Its English Equivalents

This paper describes a corpus-based contrastive study of collocation in English and Chinese. In light of the corpus-based approach to identify functionally equivalent units, the present paper attempts to identify the collocational translation equivalents of zunshou by using a parallel corpus and two comparable corpora. This study shows that more often than not, we can find in English more than ...

Publication date: 2005